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Abstract 

In this paper, we show how a complete and exact Bayesian analysis of a parametric mixture model is possible 
in some cases when components of the mixture are taken from exponential families and when conjugate priors 
are used. This restricted set-up allows us to show the relevance of the Bayesian approach as well as to exhibit 
the limitations of a complete analysis, namely that it is impossible to conduct this analysis when the sample 
size is too large, when the data are not from an exponential family, or when priors that are more complex 
"q . than conjugate priors are used. 
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1 Introduction 



As a warning to the reader, we want to stress from the beginning that this paper is mostly a formal 
exercise: to understand how the Bayesian analysis of a mixture model unravels and automatically 
exploits the missing data structure of the model is crucial for grasping the details of simulation 
methods (not covered in this paper, see, e.g., lEobert and Casellall20ollLee et alE oOQ) that take full 



advantage of the missing structures. It also allows for a comparison between exact and approximate 
techniques when the former are available. While the relevant referen ces are poin t ed ou t in due time, 



we note here that our paper builds upon the foundational paper of iFearnheadl (|2005l ). 



We thus assume that a sample x = (xi, . . . , x n ) from the mixture model 



m 

Y,Pih(x) exp{e i -R(x)-^(e i )} (1) 
i=l 

is available, where 8 ■ R(x) denotes the scalar product between the vectors 9 and R(x). We are 



selecting on purpose the natural representation of an exponential family (see, e.g. iRobertl . 12001 
Chapter 3), in order to facilitate the subsequent derivation of the posterior distribution. 

When the components of the mixture are Poisson V(Xi) distributions, if we define 9{ = log A 
the Poisson distribution indeed is written as a natural exponential family: 



f(x\0i) = (1/xl) expj^x-e^} 



For a mixture of multinomial distributions Ai(m;qn, . . . ,qi v ), the natural representation is given 

by 

f(x\6i) = (m\/xi\ ■ ■ ■ x v \) exp (an log qn H \-x v logq iv ) 

and the overall (natural) parameter is thus 6i = (log qn, . . . , log 

In the normal N(ni, erf) case, the derivation is more delicate when both parameters are unknown 
since 

f( x \ B .) - 1 exo + ^ + Z^L 



1 
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In this particular setting, the natural parameterisation is in 0, = (ni/af, while the statistic 

R{x) = (x, —x 2 /2) is two-dimensional. The moment cumulant function is then \E'(#) = #i/#2- 



2 Formal derivation of the posterior distribution 
2.1 Locally conjugate priors 



As described in the standard litera ture on mixture estimation (jDempster et al.l . ll977l . lMacLachlan and Peell . 
200fj| . iFriihwirth-Schnatterl . l2006h . the missing variable decomposition of a mixture likelihood as- 
sociates each observation in the sample with one of the k components of the mixture ([1]), i.e. 

Xi\zi ~ f(xi\9 Zj ) . 

Given the component allocations z, we end up with a cluster of (sub)samples from different dis- 
tributions from the same exponential family. Priors customarily used for the analysis of these 
exponential families can therefore be extended to the mixtures as well. 

While conjugate priors do not formally exist for mixtures of exponential families, we will define 
locally conjugate priors as priors that are conjugate for the completed distribution, that is, for the 
likelihood associated with both the observations and the missing data z. This amounts to taking 
regular conjugate priors for the parameters of the different components and a conjugate Dirichlet 
prior on the weights of the mixture, 



(pi,...,Pk) ~£>(ai, 
When we consider the complete likelihood 



. Ctk) 



L c (#,p|x, z) = f[p Zi exp [9 Zi ■ R{xi) - 



i=l 
k 

Y\ Pj 3 exp 

j=i |_ Zi=J 

k 



'jJi > 



it is easily seen that we remain within an exponential family since there exists a sufficient statistic 
with fixed dimension, (m, S\, . . . , n&, Sk). If we use a Dirichlet prior, 



T(ai + ... + a k ) ai _i 
7r(pi, . . . ,p k ) = -FT - ^ , Pi 1 



Oik — 1 

■Pk 



T(ai) • • • T(a k ) 

on the vector of the weights (pi, . . . ,pk) defined on the simplex of and (generic) conjugate 
priors on the OjS, 

7Tj(0j] oc exp [9j ■ SQj — \jfy(9j)] , 

the posterior associated with the complete likelihood L c (9,p\x, z) is then of the same family as the 
prior: 

7r(0,p|x,z) oc tv(9,p) x L c (9, p|x,z) 
k 

oc \[i>y : exp [Oj ■ s 0j - X^{9j)} x p'y exp [Oj ■ Sj - n^{9j)\ 

3=1 
k 

= Up?^' 1 ex P & ■ ^ + s j) ~ ^ + ; 
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the parameters of the prior are transformed from ay to ay + rij , from soj to s^j + Sj and from Aj 
into Xj + rij . 

For instance, in the case of the Poisson mixture, the conjugate priors are Gamma G(a,j, bj), with 
corresponding posteriors (for the complete likelihood), Gamma G[a,j + Sj, bj + rij) distributions, in 
which Sj denotes the sum of the observations in the jth group. 

For a mixture of multinomial distributions, M(m; qji,..., qj v ), the conjugate priors are Dirich- 
let V v (f3ji, . . . , (3j V ) distributions, with corresponding posteriors V v (/3ji + Sj\+, . . . , f3j v + Sj v ), Sj u 
denoting the number of observations from component j in group u (1 < u < v), with Y2 U s jv = rijiri. 

In the normal mixture case, the standard conjugate priors are products of normal and inverse 
gamma distributions, i.e. 

[ij\(jj ~ J^(Cj,crj/cj) and aj 2 ~ Q(aj/2,bj/2) . 

Indeed, the corresponding posterior is 

[ij\oj ~ M{{cjij + rijXj,^/ (cj +rij)) 

and 

of ~ G{{a 3 + nj}/2, {bj + n 3 a) + (xj - ^/(cj 1 + nj 1 )}) , 

where rijXj is the sum of the observations allocated to component j and nj& 2 is the sum of the 
squares of the differences from Xj for the same group (with the convention that njd 2 = when 
rij = 0). 



2.2 True posterior distributions 

These straightforward derivations do not correspond to the observed likelihood, but to the com- 
pleted lik elihood. While this may be eno ugh for some simulation methods like Gibbs sampling 
(see, e.g. biebolt and Robertl . Il990l . Il994h . we need further developments for obtaining the true 



posterior distribution. 

If we now consider the observed likelihood, it is natural to expand this likelihood as a sum of 
completed likelihoods over all possible configurations of the partition space of allocations, that is, 
a sum over k n terms. Except in the very few cases that are processed below, including Poisson and 
multinomial mixtures (see Section \2.3\i . this sum does not simplify into a smaller number of terms 
because there exists no summary statistics. From a Bayesian point of view, the complexity of the 
model is therefore truly of magnitude 0(k n ). 

The observed likelihood is thus 

k 

z j=l 

(with the dependence of (rij,Sj) upon z omitted for notational purposes) and the associated pos- 
terior is, up to a constant, 

k 

^11/'.'; ' exp {9j ■ ( Soj + Sj) - {nj + A,-)*^)} 

z j=l 

= w(z)7r(0,p|x,z) , 

z 

where to(z) is the normalising constant missing in 

k 

Hp^- 1 exp {Oj ■ (s 0j + Sj) - (rij + Xj)^)} 

3=1 
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i.e. 

w(z) cx ■ x || K(s j + S/.nj- + Xj) , 

if if (£, <5) is the normalising constant of exp {9j ■ £ — 5^(6j)}, i.e. 

K(C,S) = J exp {9 j d0. 

The posterior ^ z w(z) 7r(#, p|x, z) is therefore a mixture of conjugate posteriors where the param- 
eters of the components as well as the weights can be computed in closed form! The availability 
of the posterior does not mean that alternative estimates like MAP and MMAP estimates can be 
computed easily. However, this is a useful closed form result in the se nse that moments can be com- 
puted exactly: for instance, if there is no label switching problem ( Stephens! . 2000bl . Jasra et al. 



20051 ) and, if the posterior mean is producing meaningful estimates, we have that 



E[v^ J oix] = E-( z ) f22 ^ ± r L 

rij -+- Aj , 



since, for each allocation vector z, we are in an exponential family set-up where the posterior 
mean of the expectation ^(9) of R(x) is available in closed form. ( Obviously, the pos terior mean 
only makes sense as an estimate for very discriminative priors; see I.Tasra et al.l l200fJ . ) Similarly, 
estimates of the weights pj are given by 



E \pj\x] = V w (z) 



\ - , , rij + aj 

> UMZ) — - 

^ n + a. 

z 



where a. = Y2j a j- Therefore, the only computational effort required is the summation over all 
partitions. 

This decomposition further allows for a closed form expression of the marginal distributions of 
the various parameters of the mixture. For instance, the (marginal) posterior distribution of 9{ is 
given by 

exp [Oj ■ (s 0j + Sj)- (nj + X j )^(9 j )] 



E 



oj z 



K(s j + 3j,rij + Aj) 



(Note that, when the hyperparameters aj, soj, and rij are independent of j, this posterior distri- 
bution is independent of j.) Similarly, the posterior distribution of the vector (p%, . . . ,pk) is equal 
to 

oj(z) V{m + a 1: . . . , n k + a k ) . 

z 

If k is small and n is large, and when all hyperparameters are equal, the posterior should then have 
k spikes or peaks, due to the label switching / lack of identifiability phenomenon. 
We will now proceed through standard examples. 



2.3 Poisson mixture 

In the case of a two component Poisson mixture, 

let us assume a uniform prior on p (i.e. a\ = «2 = 1) and exponential priors £xp{l) and £xp(l/10) 
on Ai and A2, respectively. (The scales are chosen to be fairly different for the purpose of illustration. 
In a realistic setting, it would be sensible either to set those scales in terms of the scale of the 
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prob lem, if known, or to estimate the global scale following the procedure of lMengersen and Robert 
19961 .1 



The normalising constant is then equal to 



exp [0jf-tf]og(0j-)] d9 



-oo 

CO 



A^ 1 exp(— 5Xj) d\j 



with soi = 1 and S02 = 10, and the corresponding posterior is (up to the normalisation of the 
weights) 

2 

H T( nj + l)r(l + Sfi/iaoj + n s ) s s +x 

Yl ~ ~ — a — ; ^ p ' x ' z ) 



r(2 + E=i%- 



E 



UnjlSjl/isoj+rijfi 



Si+l 



3=1 



(N + l) 



■7r((9,p|x,z) 



v E II n i ! 5 i ! /( s 0j + nj) s ' +1 it(6, p|x, z) , 

z j = l 

with 7r(0, p|x, z)) corresponding to a Beta Be(l + rij,l + N — nj) distribution on p and to a Gamma 
Qa{Sj + 1, soj + rtj) distribution on Aj (j = 1, 2). 

An important feature of this example is that the sum does not need to involve all of the 
2 n terms, simply because the individual terms in the previous sum factorise in (rai, ri2, Si, S2), 
which then acts like a local sufficient statistic. Since rti = n — n\ and S2 = Yl x i ~ ^l) the 
posterior only requires as many distinct terms as there are distinct values of the pair (ni,S±) 
in the completed sample. For instance, if the sample is (0,0,0,1,2,2,4), the distinct values 
of the pair (m.Si) are (0, 0), (1, 0), (1, 1), (1, 2), (1, 4), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), 
. . . , (6, 5), (6, 7), (6, 8), (7, 9). There are therefore 41 distinct terms in the posterior, rather than 
2 s = 256. 

The problem of computing the number (or c ardinality) UxJni , Si) of terms in the k n sum with 
the same statistic (ni,Si) has been tackled by iFearnheadl (|2005h in that he proposes a recursive 
formula for computing (J,(nx, S±) in an efficient way, as expressed below for a k component mixture: 

Theorem 1: ( Fearnheadl . 20051 ) If denotes the vector of length k made up of zeros everywhere 
except at component j where it is equal to one, if 



n = (m, ...,n k ) 



n 



(ni, 



l,...,n k ), and ye, = (0, . 



,0) 



then 



fj,i(ej,yej) = 1 and /x n (n, s) = } y /x n -l(n ~ ej, s - y n ej). 

3=1 

Therefore, once the /x n (n, s) are all computed, the posterior can be written as 



Hn(ni,Si) JJ [nj\Sjl/(s 0j + nj) s ' +1 ] ir(9, p|x, m, Si) 

(n u Si) 3=1 
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G 



up to a constant, since the complete likelihood posterior only depends on the sufficient statistic 
(ni,5i). 

Now, the closed-form expression allows for a straightforward representation of the marginals. 
For instance, the marginal in Ai is given by 

E ]JI n i ls M a W + n i) Si+1 ^ K + l) 5l+1 A 51 exp{-(m + l)Ai}/m! 

2 

= ^n(n 1 ,Si)]Jn j \S j \/(so j + n j ) s ^ +1 

(ni.Si) i=l 

x (m + l) Sl+1 Af J exp{-(ni + l)Ai}/m! 

up to a constant, while the marginal in A2 is 

2 

M™i,Si) J] (ny! + n,) 5 ^ 1 ) (n 2 + 10) 52+1 Af 2 exp{-(n 2 + 10)A 2 }/n 2 ! 

(ni.Si) i=l 



V- , Jlj ,/>;!*;!/(*o; • (JV + 1)! . N _ ni 

> M^Si) — nvn~Tu TTTt m p 1_p 

^ (iV + 1)! m!(iV-ni)! 

q ifQ a \\'a(-i m \N—u 



again up to a constant, and the marginal in p is 

Sj^/(s 0j + 

(JV+1)! ni!(7V-m) 

V- v- , c , 5i!(5-5i)!p"(l-p) 
"2-, 2^ ( u+1) s 1+ i( n _ u+ io)S-Si+i ' 

u=0Si;ni=u \ / \ / 

still up to a constant, if S 1 denotes the sum of all observations. 

As pointed out above, another interesting outcome of this closed- form representation is that 
marginal likelihoods (or evidences) can also be computed in closed form. The marginal distribution 
of x is directly related to the unormalised weights uj(z) = u(ni, Si) in that 

m(x) = y^u)(z) = y~] fj, n (ni,Si)u)(m,Si) 

z (m,5i) 

= ^ Mm. Si) (]vTi)! ' 

(ni,Si) 

up to the product of factorials l/y\l ■ ■ ■ y n ! (but this is irrelevant in the computation of the Bayes 
factor). 

In practice, the derivation of the cardinalities fi n (ni, Si) can be done recursively as in lFearnhead 

include each observation yk by updating all the /Ufc_i(ni, S\,k—l — n\, 5 2 )s in both /Xfc(ni + 
1, Si + y&, re 2 , £2) and ^k{ n ii Si, n 2 + 1, S 2 + y&), and then check for duplicates. Below is a naive R 
implementation (for reasonable efficiency, the algorithm should be programmed in a faster language 
like C.) 3 where ncomp denotes the number of components: 

#Matrix of sufficient statistics, last column is number of occurrences 
cardin=matrix (0 , ncol=2*ncomp+l ,nrow=ncomp) 



#Initialisation 

for (i in 1: ncomp) cardin[i, ((2*i)-l) : (2*i)]=c(l,dat [1] ) 
cardin [ , 2*ncomp+l] =1 
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#Update 

for (i in 2 : length (dat) ) { 
ncard=dim(cardin) [1] 

update=matrix (t ( cardin) , ncol=2*ncomp+l , nrow=ncomp*ncard , byrow=T) 
for (j in : (ncomp-1) ) { 

update [j*ncard+(l :ncard) , (2*j)+l]= 

update [j*ncard+(l :ncard) , (2*j)+l]+l 
update [j*ncard+(l :ncard) , (2*j)+2]= 

update [ j *ncard+ ( 1 : ncard) , (2* j ) +2] +dat [i] 

} 

update=update [do . call (order , data. frame (update) ) ,] 

nu=dim (update) [1] 

#changepoints 

j j=c(l , (2 :nu) [apply (abs (update [2 :nu, 1 : (2*ncomp)] - 
update [1 : (nu-1) , 1 : (2*ncomp)] ) , 1 ,sum)>0] ) 
# duplicates or rather ncomplicates ! 
duplicates=(l :nu) [~jj] 
if (Iength(duplicates)>0){ 

for (dife in 1 : (ncomp-1) ) { 

j i=j j [j j+dife<=nu] 

ii=j i [apply (abs (update [j i+dif e , 1 : (2*ncomp)] - 
update [j i , 1 : (2*ncomp)] ) , 1 , sum)==0] 
if (length(ii)>0) 
update [ii , (2*ncomp) +1] =update [ii , (2*ncomp) +1] + 

update [ii+dife , (2*ncomp)+l] 

} 

update=update [-duplicates,] 
} 

cardin=update 
} 

At the end of this program, all non-empty realisations of the sufficient (rai, Si) are available in the 
two first columns of cardin, while the corresponding fJ, n (ni,Si) is provided by the last column. 

Once the /U n (ni, Si)'s are available, the corresponding weights can be added as the last column 
of cardin, i.e. 

w=log ( cardin [, 2*ncomp+l] )+apply (If actorial (cardin [, 2* (1 :ncomp) -1] ) , 1 , sum)+ 

apply (If actorial (cardin [, 2* (1 :ncomp)] ) , 1 , sum)- 

apply (log(xi [1 :ncomp] +cardin [, 2* (1 :ncomp) -1] ) * 
(cardin [, 2* (1 :ncomp)] +1) , 1 , sum)- sum (If actorial (dat) ) 
w=exp (w-max (w) ) 
cardin=cbind (cardin , w) 

where xi[j] denotes sqj. The marginal posterior on Ai can then be plotted via 
marlam=f unction (lam, comp=l) { 
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h — 9 


A. 


h — A 


(10,0.1) 


11 


66 


286 


(10,1) 


52 


885 


8160 


(10,10) 


166 


7077 


120,908 


(20,0.1) 


57 


231 


1771 


(20,1) 


260 


20,607 


566,512 


(20, 10) 


565 


100,713 


— 


(30, 0.1) 


87 


4060 


81,000 


(30,1) 


520 


82,758 




(30, 10) 


1413 


637,020 




(40,0.1) 


216 


13,986 




(40, 1) 


789 


271,296 




(40, 10) 


2627 







Tab. 1: Number of pairs (ni, Si) for simulated datasets from a Poisson 'P(A) and different numbers 
of components. (Missing terms are due to excessive computational or storage requirements.) 



sum(cardin [ , 2* (ncomp+1)] *dgamma(lam, shape=cardin [, 2*comp] +1 , 

rate=cardin[,2*comp-l]+xi [comp] ))/sum(cardin[, 2* (ncomp+1)] ) 

} 

lalam=seq( .01,1. 2*max(dat) , le=100) 
mamar=apply (as .matrix (lalam) , 1 ,marlam, comp=l) 

plot (lalam,mamar ,type="l" ,xlab=expression(mu[l] ) ,ylab=" " ,lwd=2) 

while the marginal posterior on p is given through 
marp=function(p, comp=l){ 

sum(cardin [, 2* (ncomp+1)] *dbeta(p , shape l=cardin [, 2*comp-l] +1 , 

shape2=length (dat ) -cardin[,2*comp-l]+l)) /sum (cardin[, 2* (ncomp+1)] ) 

} 

pepe=seq( . 01 , . 99 , le=99) 

papar=apply (as .matrix (pepe) , 1 ,marp) 

plot (pepe,papar,type="l" ,xlab="p" ,ylab=" " ,lwd=2) 

Now, even with this considerable reduction in the complexity of the posterior distribution (to be 
compared with k n ), the number of terms in the posterior still grows very fast both with n and with 
the number of components k, as shown through a few simulated examples in Table [TJ (The missing 
items in the table sim ply took too muc h time or too much memory on the local mainframe when 



using our R program. iFearnheadl 120051 used a specific C program to overcome this difficulty with 
larger sample sizes.) The computational pressure also increases with the range of the data; that is, 
for a given value of (k, n), the number of rows in cardin is much larger when the observations are 
larger, as shown for instance in the first three rows of Table [TJ a simulated Poisson 'P(A) sample of 
size 10 is primarily made up of zeros when A = .1 but mostly takes different values when A = 10. 
The impact on the number of sufficient statistics can be easily assessed when k = 4. (Note that 
the simulated dataset corresponding to (n, A) = (10,0.1) in Table [JJ corresponds to a sample only 
made up of zeros, which explains the n + 1 = 11 values of the sufficient statistic (n\, S\) = (ni, 0) 
when k = 2.) 

An interesting comment one can make about this decompo s ition of the posterior distribution 



is that it may happen that, as already noted in ICasella et al.l (|2004l ). a small number of values 



of the local sufficient statistic (ni,S{) carry most of the posterior weight. Table [2] provides some 
occurrences of this feature, as for instance in the case (n, A) = (20, 10). 
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(n,A) 


k = 2 


k = 3 


k = 4 


(10,1) 


20/44 


209/675 


1219/5760 


(10,10) 


58/126 


1292/4641 


13,247/78,060 


(20,0.1) 


38/40 


346/630 


1766/6160 


(20, 1) 


160/196 


4533/12,819 


80,925/419,824 


(10,0.1, 10,2) 


99/314 


5597/28,206 




(10,1,10,5) 


21/625 


13,981/117,579 


— 


(15,1,15,3) 


50/829 


62,144/211,197 




(20, 10) 


1/580 


259/103,998 




(30,0.1) 


198/466 


20,854/70,194 


30,052/44,950 


(30,1) 


202/512 


18,048/80,470 




(30,5) 


1/1079 


58,820/366,684 





Tab. 2: Number of sufficient statistics (rii,Si) corresponding to the 99% largest posterior 
weights/total number of pairs for datasets simulated either from a Poisson V{\) or from 
a mixture of two Poisson V(\i), and different numbers of components. (Missing terms are 
due to excessive computational or storage requirements. 



We now turn to a minnow dataset made of 50 observations, for which we need a minimal 
description. As seen in Figure [H the datapoints take large values, which is a drawback from a 
computational point of view since the number of statistics to be registered is much larger than 
when all datapoints are small. For this reason, we can only process the mixture model with k = 2 
components. 

If we instead use a completely symmetric prior with identical hyperparameters for Ai and A2, 
the output of the algorithm is then also symmetric in both components, as shown by Figure The 
modes of the marginals of Ai and A2 remain the same, nonetheless. 

2.4 Multinomial mixtures 

The case of a multinomial mixture can be dealt with similarly: If we have n observations rij = 
(flji, . . . , rijk) from the mixture 

n, ~ pMk(dj-,qn, ■ ■ -,qik) + (1 ~ p)M k {d j ; q 2 i, . . -,q2k) 

where riji + • • • + rijf. = dj and qn + • • • + qik = 921 + • • ■ + ?2fc = 1> the conjugate priors on the 
qtjS are Dirichlet distributions (i = 1,2), 

(qn, ... , q ik ) ~ T>(an, ... , a ik ) , 

and we use once again the uniform prior on p. (A default choice for the 1/2.) Note 

that the djS may differ from observation to observation, since they are irrelevant for the posterior 
distribution: given a partition z of the sample, the complete posterior is indeed 

i=l Zj=i i=l h=l 

up to a normalising constant that does not depend on z. 

More generally, if we consider a mixture with m components, 

m 

n i ~ ^PiMkjdj] qn,..., qik) , 
1=1 
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Fig. 1: (top left) Marginal posterior distribution of Ai (top right) marginal posterior distribution of 
A2 (bottom left) marginal posterior distribution of p (bottom right) histogram of the minnow 
dataset. (The prior parameters are 1/100 and 1/200 to remain compatible with the data 
range.) 
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the complete posterior is also directly available, as 

mm m k 



-1/2 

i=l i=l Zj=i i=l h=l 



ri/: - ii ii /;; -mi" 



once more up to a normalising constant. 

The corresponding normalising constant of the Dirichlet distribution being 

v nj=i r («y) 



r(aji H h ajfe) ' 

it produces the overall weight of a given partition z as 

. . IlJ=i r («ij + W)=i^ i^ij + S 2j ) 

n i !n 2 ! w — Z Z^Z x rv — ^ Z Z~^Z ' ( 2 ) 

1 (an H haifc + oi.) I (a 2 i H ha 2 fc + ^2-j 

where n« is the number of observations allocated to component i, Sy is the sum of the n^s for the 
observations I allocated to component i and 



£ S n « 



J Zi=l 

Given that the posterior distribution only depends on those "sufficient" statistics Sij and rii, 
the same factorisation as in the Poisson case applies, namely that we simply need to count the 
number of oc currences o f a par ticular local sufficient statistic (n±, Sn, . . . , Sk m ). The book-keeping 
algorithm of iFearnhead (2005) applies in this setting as well. What follows is a naive R program 
translating the above: 

em=dim(dat) [2] 
emp=em+l 

empcomp=emp*ncomp 



#Matrix of sufficient statistics: 
#last column is number of occurrences 

#each series of (em+1) columns contains, first, number of allocations 
# and, last, sum of multinomial observations 
cardin=matrix (0 , ncol=empcomp+l ,nrow=ncomp) 

Therefore, the (k + l)th column of cardin contains the sum of the djS for the j's allocated to 
the first component. 

#Initialisation 

for (i in l:ncomp) cardin[i , emp* (i-l)+(l : emp)] =c(l ,dat [1 ,] ) 
cardin [ , empcomp+1] =1 



#Update 

for (i in 2 : dim(dat) [1] ) { 



ncard=dim( cardin) [1] 

update=matrix (t ( cardin) , ncol=empcomp+l , nrow=ncomp*ncard , byrow=T) 



for (j in : (ncomp-1) ) { 
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indi= j *ncard+ ( 1 : ncard) 
empj=emp*j 

update [indi ,empj + l] =update [indi , empj + 1] +1 

update [indi ,empj+(2 : emp)] =t (t (update [indi , empj+(2 : emp)] )+dat [i ,] ) 
} 

update=update [do . call (order , data. frame (update) ) ,] 

nu=dim(update) [1] 
#changepoints 

j j=c(l , (2 :nu) [apply (abs (update [2 :nu, 1 : empcomp] -update [1 : (nu-1) , 

1 : empcomp] ) , 1 , sum) >0] ) 
# duplicates or rather ncomplicates ! 
duplicates=(l :nu) [~jj] 
if (length (duplicates) >0) { 

for (dife in 1 : (ncomp-1) ){ 

j i=j j [j j+dife<=nu] 

ii=ji [apply (abs (update [j i+dif e , 1 : empcomp] - 

update [ j i , 1 : empcomp] ) , 1 , sum) ==0] 
if (length (ii)>0) 
update [ii , empcomp+1] =update [ii , empcomp+1] + 

update [ii+dif e , empcomp+1] 

} 

update=update [-duplicates , ] 
} 

cardin=update 

#print (sum(cardin [ , 2*ncomp+l] ) -ncomp~i) 

> 

where dat is now a matrix with k columns. 

The computation of the number of replicates of a given sufficient statistic 

°" = { n l j Sll , • • • j fl m , Si m , ■ • • j Skm) i 

/i n (cr), is then provided by the last column of the matrix cardin. The overall weight is then 
computed as the product of (J, n {a) with the normalising constant ([2]): 

olsums=matrix(0,ncol=ncomp,nrow=dim (update) [1] ) 

for (y in l:ncomp) 
colsums [,y] =apply (update [, (y-1) *emp+(2 : emp)] , 1 , sum) 

w=log( cardin [, empcomp+1] )+ 

apply (If act or ial( cardin [, emp* (0 : (ncomp-1) ) + l] ) , 1 , sum) + 

apply (If actorial ( cardin [ , 

(1 : empcomp) [-1-emp* (0 : (ncomp-1) )] ] - . 5) , 1 , sum) - 

apply (If act or ial( colsums )+em* .5-1,1, sum) - sum (If actorial (dat) ) 
w=exp (w-max (w) ) 
cardin=cbind (cardin , w) 

As shown in Table [3l once again, the reduction in the number of cases to be considered is enormous. 
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(n, dj, k) 


m = 2 


m = 3 


m = 4 


m = 5 


(10,5,3) 

\ 7 7/ 


33/35 


4602/9093 


56,815/68,964 




(10,5,4) 


90/232 


3650/21,249 


296,994/608,992 




(10,5,5) 


247/707 


247/7857 


59, 195/409,600 




(10, 10,2) 


19/20 


803/885 


3703/4800 


7267/11550 


(10, 10,3) 


117/132 


1682/1893 


48,571/60,720 




(10,10,4) 


391/514 


3022/3510 


83,757/170,864 


- 


(10,10,5) 


287/1008 


7, 031/12,960 


35,531/312,320 




(20,5,2) 


129/139 


517/1140 


26,997/45,600 


947/10,626 


(20,5,3) 


384/424 


188,703/209,736 


108,545/220,320 




(20,5,4) 


3410/6944 


819,523/1,058,193 






(20,10,5) 


1225/1332 


9,510/1,089,990 







Tab. 3: Number of sufficient statistics (rij, £V/)i<i<m,i<j<fc corresponding to the 99% largest poste- 
rior, of pairs (nj, Si) corresponding to the 99% largest posterior weights, and total number 
of statistics for datasets simulated from mixtures of m multinomial A4 k (dj;qx, . . . , q k ) and 
different parameters. 

(Missing terms are due to excessive computational or storage requirements.) 



2.5 Normal mixtures 

For a normal mixture, the number of truly different terms in the posterior distribution is much 
larger than in the previous (discrete) cases, in the sense that only permutations of the members 
of a given partition within each term of the partition provide the same local sufficient statistics. 
Therefore, the number of observations that can be handled in an exact analysis is necessarily 
extremely limited. 

As mentioned in Section \2.1\ the locally conjugate priors for normal mixtures are products of 
normal Hj\o~j ~ Af (£j , a 2 / cj) by inverse gamma aj 2 ~ G{aj/2, bj/2) distributions. For instance, in 
the case of a two-component normal mixture, 

X\,..., X n ~ pM{fJLx,a\) + (1 - p)A/"(m2,o"1) , 

we can pick /ii|cri ~ A/"(0, 10cr|), [i2\o~2 ~ A^(l,10cr|), aj 2 ~ £(2,2/<7o), if a difference of one 
between both means is considered likely (meaning of course that the data are previously scaled) 
and if ctq is the prior assumption on the variance (possibly deduced from the range of the sample) . 
Obviously, the choice of a Gamma distribution with 2 degrees of freedom is open to discussion, as 
it is not without consequences on the posterior distribution. 

The normalising constant of the prior distribution is (up to a true constant) 

k 

K(ax, . . . ,a k ,bx, . . . ,b k ,c x , . . . ,c fc ,£i, . . . ,£ k ) = Y\_V^j ■ 

i=X 

Indeed, the corresponding posterior is 

Hj\vj ~ M [{cj£,j + njXj,a 2 /(cj + nj)] 

and 



a, 2 ~ Q { aj + rijj/2, {bj + njo] + (xj - Zj f/iCj 1 + n- x )} 



The number of different sufficient statistics (nj,Xj,aj) is thus related to the numbe r of d i fferen t 



partitions of the dataset into at most k groups. This is related to the Bell number (jRotal . Il964h . 
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which grows extremely fast. We therefore do not pursue the example of the normal mixture any 
further for lack of practical purpose. 
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